
Support PaddlePaddle with compatible API #1642

Closed

SigureMo wants to merge 8 commits into flashinfer-ai:main from PFCCLab:support-paddlepaddle-with-compatible-api

Conversation


@SigureMo SigureMo commented Sep 5, 2025

We are PaddlePaddle contributors working on a PyTorch compatibility layer aimed at making it significantly easier for PyTorch ecosystem libraries to run on Paddle. See context: #1563

Summary

  • This PR introduces a minimal, opt-in compatibility path so third-party projects such as flashinfer can be used with Paddle with very small changes.
  • The approach is intentionally minimal and opt-in to avoid breaking upstream behavior for existing PyTorch users.

Design

  • C++ / CUDA layer: provide an adapter that is fully compatible with the PyTorch C++ API surface (ATen / c10 / torch)[^1]. This allows third-party libraries that call into PyTorch C++/CUDA APIs to invoke Paddle's C++/CUDA implementation through the adapter instead.
  • Python layer: reorganize a small compatibility layer so that Paddle's Python API matches PyTorch's API shape as closely as possible (we avoid reproducing PyTorch-specific internals like `TorchVersion`). The goal is that Python code can do `import paddle as torch` and run with minimal or no source changes.
  • Import proxy: provide `paddle.compat.enable_torch_proxy()`[^2], which makes `import torch` actually load paddle. This removes the need for `import paddle as torch` in most cases and keeps changes non-invasive.
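
For intuition, the simplest form of such a proxy is a `sys.modules` alias. The sketch below is purely illustrative and is not Paddle's actual implementation (the real `enable_torch_proxy` is linked in footnote 2):

```python
# Illustrative sketch only; Paddle's real implementation is in
# paddle.compat.enable_torch_proxy (see footnote 2). Aliasing the module in
# sys.modules makes a later `import torch` return the paddle module object.
import importlib
import sys

def enable_torch_proxy_sketch() -> None:
    paddle = importlib.import_module("paddle")
    sys.modules["torch"] = paddle

enable_torch_proxy_sketch()
import torch  # the name `torch` is now bound to the paddle module

assert torch is sys.modules["paddle"]
```

A production proxy must also handle submodule imports such as `torch.nn` (for example via a `sys.meta_path` finder), which a bare alias does not cover.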

Usage (example)

  • Install (build with compatibility enabled):

    ```bash
    PADDLE_COMPATIBLE_API=1 pip install -v --no-build-isolation .
    ```

  • Runtime example:

    ```python
    # example.py
    import paddle

    paddle.compat.enable_torch_proxy()  # enable the proxy before any `import torch`

    import flashinfer

    # use ops in flashinfer ...
    ```

    Then run:

    ```bash
    PADDLE_COMPATIBLE_API=1 python example.py
    ```

Why this is opt-in

  • We added a simple check for the environment variable `PADDLE_COMPATIBLE_API`. When set, the compatibility hooks and small source adjustments are enabled. This keeps the default behavior unchanged for regular PyTorch or Paddle users.
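
For concreteness, such a gate could look like the following sketch; the helper name matches the `use_paddle_compatible_api` referenced later in this thread, but the exact set of accepted truthy values is our assumption:

```python
import os

def use_paddle_compatible_api() -> bool:
    # Opt-in gate: compatibility hooks activate only when the user explicitly
    # sets PADDLE_COMPATIBLE_API (the accepted values here are an assumption).
    return os.environ.get("PADDLE_COMPATIBLE_API", "").lower() in ("1", "true", "on")
```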

Small changes requested in flashinfer

  • JIT-related logic: some JIT code in flashinfer assumes PyTorch's directory layout. We do not aim to mirror that directory structure 1:1, so we request a small refactor to decouple the logic from the exact torch package file layout (make the paths configurable or resolve modules by import name; see the sketch after this list).
  • setup.py / AOT build: during AOT compilation, setup.py currently does `import torch`. For compatibility builds, the build needs to call `paddle.compat.enable_torch_proxy()` early (before `import torch`), or otherwise provide a small hook so the build imports load paddle instead.
  • To keep these changes backward-compatible, our patch adds an environment-variable-driven path (`PADDLE_COMPATIBLE_API`) inside flashinfer; when it is present, we enable the compatibility adjustments only in that mode.
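
To illustrate the decoupling requested in the first item, the JIT logic could resolve package locations through the import system instead of assuming PyTorch's on-disk layout. This is a hypothetical sketch, not flashinfer's current code; the helper name and derived paths are ours:

```python
import importlib.util
from pathlib import Path

def resolve_package_root(module_name: str = "torch") -> Path:
    # Resolve the installed package directory via the import system, so the
    # same logic works whether `torch` is real PyTorch or a Paddle proxy.
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {module_name!r}")
    return Path(spec.origin).parent

# Include paths derive from the resolved root instead of a hard-coded layout.
torch_home = resolve_package_root("torch")
include_dirs = [torch_home / "include"]
```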

Would these minimal, environment-gated changes be acceptable to the flashinfer maintainers?

What we tested

  • We tested the flashinfer.fused_moe.cutlass_fused_moe interface. With the compatibility mode enabled and some additional Python-side compatibility work in progress, we successfully ran fp16 unit tests for that interface.
  • We plan to incrementally expand test coverage and run more of flashinfer's unit tests in CI as part of further collaboration.

Next steps (proposed)

  • If you’re open to collaboration, we can:
    • Open a PR with the smallest possible changes to flashinfer (clearly marked and gated).
    • Add flashinfer tests into PaddlePaddle's CE system (run daily under PADDLE_COMPATIBLE_API=1) and gradually increase coverage.

Thank you for reviewing this PR — we welcome your feedback on the minimal integration approach and are ready to iterate on the branch or make any changes you prefer.

Footnotes

[^1]: https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/phi/api/include/compat

[^2]: https://github.com/PaddlePaddle/Paddle/blob/b38a9503d4f3f7c84af44a6399bb76ee043e7616/python/paddle/compat.py#L110

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @SigureMo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a strategic compatibility layer to bridge PyTorch-dependent projects with PaddlePaddle. Its primary goal is to facilitate the adoption of libraries like FlashInfer within the PaddlePaddle ecosystem by providing a seamless, opt-in mechanism that intelligently adapts build processes and API calls without disrupting existing workflows. The changes are designed to be non-invasive and environment-gated, ensuring flexibility and stability for both PyTorch and PaddlePaddle users.

Highlights

  • PaddlePaddle Compatibility: Introduces a minimal, opt-in compatibility layer to enable PyTorch ecosystem libraries, such as FlashInfer, to run on PaddlePaddle with minimal code changes.
  • Design Approach: The compatibility is achieved through a C++/CUDA layer adapter for PyTorch's C API surface (ATen/c10/torch), a Python layer to match PyTorch's API shape (allowing import paddle as torch), and an import proxy (paddle.compat.enable_torch_proxy()) that makes import torch load paddle.
  • Opt-in Mechanism: The compatibility features are activated by setting the PADDLE_COMPATIBLE_API environment variable, ensuring that default behavior remains unchanged for regular PyTorch or Paddle users.
  • Build System Adjustments: Modifies the JIT compilation logic and setup.py to conditionally include PaddlePaddle-specific include paths and linker flags, or PyTorch ones, based on the compatibility mode.
  • Initial Testing: Successfully tested the flashinfer.fused_moe.cutlass_fused_moe interface with fp16 unit tests under the new compatibility mode.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a compatibility layer for PaddlePaddle, allowing third-party PyTorch ecosystem libraries to run on Paddle with minimal changes. The changes involve adapting C++/CUDA APIs and reorganizing the Python API to match PyTorch's structure. The integration is opt-in, controlled by the PADDLE_COMPATIBLE_API environment variable. The review focuses on correctness and maintainability, particularly concerning the conditional logic for PaddlePaddle compatibility and the modifications to the build process.

Comment thread: setup.py

Comment on lines 75 to +77:

```python
if enable_aot:
    if use_paddle_compatible_api():
        import paddle
```
Severity: high

Consider adding a check to ensure paddle is importable before calling paddle.compat.enable_torch_proxy(). If paddle is not installed or available in the environment, this could lead to an ImportError and break the build process. A try...except block can be used to handle this scenario gracefully.

Suggested change:

```python
if enable_aot:
    if use_paddle_compatible_api():
        try:
            import paddle

            paddle.compat.enable_torch_proxy()
        except ImportError:
            print("PaddlePaddle is not installed. Skipping paddle.compat.enable_torch_proxy().")
```

Comment thread: flashinfer/jit/cpp_ext.py

Comment on lines +71 to +77:

```python
if use_paddle_compatible_api():
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/torch/csrc/api/include",
        ]
    )
```
Severity: medium

The system_includes.extend method is called within the if block, but the system_includes list already contains some default paths. This could lead to duplicate include paths if use_paddle_compatible_api() returns True, potentially causing issues during compilation. Consider adding the default paths within the else block to avoid duplication.

Alternatively, you can initialize system_includes as an empty list and populate it entirely within the if and else blocks to ensure no overlap.

```python
system_includes = [
    sysconfig.get_path("include"),
    "$cuda_home/include",
    jit_env.FLASHINFER_INCLUDE_DIR.resolve(),
    jit_env.FLASHINFER_CSRC_DIR.resolve(),
]
if use_paddle_compatible_api():
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/torch/csrc/api/include",
        ]
    )
else:
    system_includes.extend(
        [
            "$torch_home/include",
            "$torch_home/include/paddle/phi/api/include/compat",
            "$torch_home/include/paddle/phi/api/include/compat/torch/csrc/api/include",
        ]
    )
```

Comment thread: flashinfer/jit/cpp_ext.py

Comment on lines 128 to 131:

```python
ldflags = [
    "-shared",
    "-L$torch_home/lib",
    "-L$cuda_home/lib64",
    "-lc10",
    "-lc10_cuda",
    "-ltorch_cpu",
    "-ltorch_cuda",
    "-ltorch",
    "-lcudart",
]
```
Severity: medium

The ldflags list is initialized with "-shared" and "-lcudart" regardless of the use_paddle_compatible_api() condition. This could lead to redundancy or conflicts if the subsequent extend calls also include -shared or -lcudart. Consider initializing ldflags as an empty list and adding these flags conditionally within the if and else blocks to avoid potential issues.

```python
ldflags = []
if use_paddle_compatible_api():
    ldflags.extend(
        [
            "-shared",
            "-L$torch_home/lib",
            "-L$cuda_home/lib64",
            "-lc10",
            "-lc10_cuda",
            "-ltorch_cpu",
            "-ltorch_cuda",
            "-ltorch",
            "-lcudart",
        ]
    )
else:
    ldflags.extend(
        [
            "-shared",
            "-L$torch_home/libs",
            "-L$torch_home/base",
            "-L$cuda_home/lib64",
            "-lpaddle",
            "-lphi",
            "-lphi_core",
            "-lphi_gpu",
            "-lcommon",
            "-lcudart",
        ]
    )
```

Comment thread: flashinfer/utils.py (outdated)

Comment on lines 476 to 485:

```diff
 return flashinfer.jit.gen_jit_spec(
     "logging",
     [
-        jit_env.FLASHINFER_CSRC_DIR / "logging.cc",
+        flashinfer.jit.env.FLASHINFER_CSRC_DIR / "logging.cc",
     ],
     extra_include_paths=[
-        jit_env.SPDLOG_INCLUDE_DIR,
-        jit_env.FLASHINFER_INCLUDE_DIR,
+        flashinfer.jit.env.SPDLOG_INCLUDE_DIR,
+        flashinfer.jit.env.FLASHINFER_INCLUDE_DIR,
     ],
 ).build_and_load()
```
Severity: medium

Consider using `pathlib.Path.joinpath` instead of the `/` operator for constructing paths; it is more explicit and readable. For example, `flashinfer.jit.env.FLASHINFER_CSRC_DIR.joinpath("logging.cc")`.

Suggested change:

```python
return flashinfer.jit.gen_jit_spec(
    "logging",
    [
        flashinfer.jit.env.FLASHINFER_CSRC_DIR.joinpath("logging.cc"),
    ],
    extra_include_paths=[
        flashinfer.jit.env.SPDLOG_INCLUDE_DIR,
        flashinfer.jit.env.FLASHINFER_INCLUDE_DIR,
    ],
).build_and_load()
```


yzh119 commented Sep 5, 2025

Hi @SigureMo, we plan to go with tvm-ffi to replace the current PyTorch bindings (WIP in #1641); it should satisfy what Paddle needs. Can you double-check?

Cc @tqchen


SigureMo commented Sep 7, 2025

Hi @yzh119, thanks for the work here! I only learned about the recent TVM FFI efforts in the last couple of days. TVM FFI is indeed an excellent FFI solution for ML systems; thanks to you and @tqchen for driving this.

From what I see, TVM FFI can nicely decouple flashinfer from PyTorch by providing a framework-agnostic binding layer, which aligns well with the goals of our compatibility approach. This should remove a lot of pain points in our custom C++ operator ecosystem; at minimum, we would no longer need to worry about C++ ABI / operator-registration compatibility. I took a quick look at the implementation, and it seems we would likely only need a small adaptation for CUDA stream handling in TVM FFI (see: https://github.com/apache/tvm/blob/a819115375568e52f9d2d7376cdbb0a23346c3cb/ffi/python/tvm_ffi/cython/function.pxi#L110-L124). So I'm looking forward to your refactor.

Separately, TVM FFI as a more general, framework-agnostic custom-op solution opens up additional possibilities and could offer more options for our ecosystem compatibility strategy. Do you have any plans to promote or adopt TVM FFI in projects beyond flashinfer? If so, that could help more custom-op projects decouple from the PyTorch ecosystem and move toward a framework-agnostic custom-op ecosystem. @tqchen

@SigureMo SigureMo marked this pull request as draft September 7, 2025 20:48

tqchen commented Sep 7, 2025

Thanks @SigureMo! Yes, we do plan to bring TVM FFI up as an independent project that benefits everyone. We are still at the bring-up stage, so we haven't communicated this broadly yet, but the goal is to make it a general project that can be used across all deep learning frameworks, compilers, and libraries.

@SigureMo SigureMo closed this Sep 17, 2025

yzh119 commented Sep 27, 2025

#1641 is merged, @SigureMo would you mind checking whether it's helpful for paddle compatibility?

@SigureMo

> #1641 is merged, @SigureMo would you mind checking whether it's helpful for paddle compatibility?

@yzh119 Thanks for the work on #1641! I can confirm the C++ layer no longer depends on PyTorch after that change, which removes the adapter maintenance we were carrying on our side—really appreciate it.

I did notice the Python JIT workflow still references torch headers and some torch-specific compile flags.

```python
system_includes = [
    sysconfig.get_path("include"),
    "$torch_home/include",
    "$torch_home/include/torch/csrc/api/include",
    "$cuda_home/include",
    "$cuda_home/include/cccl",
    tvm_ffi.libinfo.find_include_path(),
    tvm_ffi.libinfo.find_dlpack_include_path(),
    jit_env.FLASHINFER_INCLUDE_DIR.resolve(),
    jit_env.FLASHINFER_CSRC_DIR.resolve(),
]
system_includes += [p.resolve() for p in jit_env.CUTLASS_INCLUDE_DIRS]
system_includes.append(jit_env.SPDLOG_INCLUDE_DIR.resolve())
common_cflags = [
    "-DTORCH_EXTENSION_NAME=$name",
    "-DTORCH_API_INCLUDE_EXTENSION_H",
]
if not sysconfig.get_config_var("Py_GIL_DISABLED"):
    common_cflags.append("-DPy_LIMITED_API=0x03090000")
common_cflags += torch_get_pybind11_abi_build_flags()
common_cflags += _get_glibcxx_abi_build_flags()
```

Do you plan to remove those as well?

On the E2E validation: we already landed the Paddle prerequisites (PaddlePaddle/Paddle#75193 and PaddlePaddle/Paddle#75205), so I’m optimistic flashinfer will run on Paddle as smoothly as it does on PyTorch. I’ll run the verification soon—likely right after the holiday.


yzh119 commented Sep 27, 2025

> Do you plan to remove those as well?

Yes, most of them are no longer required; updated in #1795.

yzh119 added a commit that referenced this pull request Sep 29, 2025

## 📌 Description

The codegen logic for PyTorch and TVM should unify after #1641, and this PR cleans up the related codegen functions in tvm_bindings.

Other changes:
1. update tvm-ffi to 0.1.0b11 to incorporate apache/tvm-ffi#67 and apache/tvm-ffi#68
2. rename source files: `_ops.cu` and `_pybind.cu` are renamed to `_binding.cu`
3. remove torch-related header includes and library linking in ninja files (#1642 (comment))
4. remove the use of `use_torch_stream` in unittests; it is no longer required after apache/tvm-ffi#68

## 🔍 Related Issues

#1641 


## Reviewer Notes

cc @MasterJH5574, please let us know what changes we need to make to help you bump to the latest version of flashinfer in MLC.
